Abstract

It is believed that almost all common diseases are the consequence of complex interactions between genetic 
markers and environmental factors. However, few such interactions have been documented to date. Conventional 
statistical methods for detecting gene and environmental interactions are often based on the linear regression 
model, which assumes a linear interaction effect. In this study, we propose a nonparametric partition-based 
approach that is able to capture complex interaction patterns. We apply this method to the real data set of 
hypertension provided by Genetic Analysis Workshop 18. Compared with the linear regression model, the 
proposed approach is able to identify many additional variants with significant gene-environmental interaction 
effects. We further investigate one single-nucleotide polymorphism identified by our method and show that its 
gene-environmental interaction effect is, indeed, nonlinear. To adjust for the family dependence of phenotypes, we 
apply different permutation strategies and investigate their effects on the outcomes. 



Background 

Genome-wide association studies (GWAS) have success- 
fully discovered many common variants associated with 
complex diseases, but the single-nucleotide polymorph- 
isms (SNPs) identified so far account for a small propor- 
tion of the total heritability in quantitative traits [1]. 
Increasing evidence shows that gene-environment (GxE) 
interactions are widely involved in the etiology of com- 
plex diseases, including diabetes, cancer, and psychiatric 
disorders [2,3]. The investigation of GxE interactions will 
not only facilitate the identification of novel genes whose 
marginal effects are undetectable, but also provide 
insights into disease etiology and hence greatly benefit 
drug development and personalized therapy. 

The commonly applied methods to detect GxE interac- 
tions are based on linear or logistic regression models 
[4]. In particular, for quantitative outcomes, a linear 
model is considered in the form of

$$y = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E) + \varepsilon \qquad (1)$$

where G is the genotype of a SNP, E is the environmental factor,
$\varepsilon$ is a normally distributed random error, and $\beta_3$ is
the coefficient corresponding to the interaction term. If $\beta_3 = 0$,
the conditional effect of the SNP is constant across different levels of
the environmental factor and we conclude that there is no GxE
interaction. This model assumes a linear interaction effect: given G,
the outcome y is linearly related to E. However, in practice,
interaction schemes are likely to be more complicated, so the linear
model will probably fail to capture the interaction effect. Therefore, there is a
pressing need to develop novel statistical approaches for 
genome-wide GxE interaction studies. Here we propose 
a nonparametric partition-based approach to detect GxE 
interactions and conduct a GWAS for hypertension 
using the real data set provided by Genetic Analysis 
Workshop 18 (GAW18). For each SNP, both the linear
regression model and the proposed method are used to
evaluate its interaction effect with each of the 4 environmental
factors: age, gender, smoking status, and antihypertensive
medication use (medicine). We note that, compared with the linear
model, the proposed method is able to identify many additional
SNPs. We further study the interaction pattern between
SNP rs17206492 and medicine, and find that this interaction
effect is, indeed, nonlinear. We also investigate
different permutation strategies in the presence or
absence of pedigree dependence of the phenotype.
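As a point of reference for the linear model in Eq. (1), the following is a minimal sketch of an ordinary least-squares fit that includes the interaction term. The function name `fit_linear_gxe` and the toy data are illustrative assumptions, not part of the original analysis. The toy example uses a crossover pattern (U-shaped in E for one genotype, inverted U for the other) to show how the fitted interaction coefficient can be essentially zero even though the effect of E clearly depends on G.

```python
def fit_linear_gxe(g, e, y):
    """Least-squares estimates [b0, b1, b2, b3] for y = b0 + b1*G + b2*E + b3*G*E."""
    rows = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    k = 4
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))]
         for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[p][k] / a[p][p] for p in range(k)]


# Toy crossover pattern (made-up numbers): genotype 1 is U-shaped in E,
# genotype 0 is an inverted U, so neither genotype has a linear trend in E.
g = [0, 0, 0, 1, 1, 1]
e = [0, 1, 2, 0, 1, 2]
y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
b0, b1, b2, b3 = fit_linear_gxe(g, e, y)
print("interaction coefficient b3 =", round(b3, 6))
```

In this toy case the best-fitting line within each genotype is flat, so the estimated interaction coefficient is numerically zero; a test based on $\beta_3$ alone would not flag the pattern.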

Methods 

Data set 

The GAW18 data set consists of GWAS data and whole 
genome sequence data with longitudinal phenotypes for 
hypertension and related traits from Type 2 Diabetes 
Genetic Exploration by Next-generation sequencing in 
Ethnic Samples (T2D-GENES) Project 2. There are 939 
individuals in total, and we include in our analysis only 
the 849 individuals with both phenotype data and 
imputed sequence information. Each individual has mea- 
surements for up to 4 time points. At each visit, systolic 
blood pressure (SBP) and diastolic blood pressure (DBP)
were measured; covariates including age, use of antihy- 
pertensive medication, and current tobacco smoking sta- 
tus were also recorded. Gender and pedigree are known 
for each subject. Genotypes of odd-numbered chromo- 
somes are provided. In our study, we focused on chro- 
mosome 3 as suggested by the workshop organizer for 
the sake of comparison. Although we had access to the 
answers for the simulated data set, we used only the 
real data set in our analysis. 

A general framework: a partition-based association measure

Suppose there are n independent subjects that can be
separated by a partition $\Pi$. An association measure
between the outcome Y and the partition $\Pi$ is defined as:

$$I_\Pi = \sum_{i \in \Pi} \frac{n_i}{n} \cdot \frac{(\bar{Y}_i - \bar{Y})^2}{s_Y^2 / n_i} \qquad (2)$$

where $n_i$ is the number of subjects in partition i, $\bar{Y}_i$ is
the average of the outcome Y for subjects in partition i,
and $\bar{Y}$ and $s_Y^2$ are the mean and variance of Y from all
subjects. It has been shown that under the null hypothesis
that $\Pi$ has no influence on Y, I asymptotically
converges to a weighted sum of $\chi^2_1$ distributions [5]. The
measure I has higher power than linear regression or logistic
regression models, even in sparse partitions.
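The partition-based measure in Eq. (2) can be sketched in a few lines. This is only an illustration: the function name `partition_score` is hypothetical, and the use of the sample variance (denominator n - 1) is an assumption, since the text does not say which variance estimator is used.

```python
from collections import defaultdict


def partition_score(y, labels):
    """Partition-based score: sum_i (n_i / n) * (mean_i - mean)^2 / (s^2 / n_i)."""
    n = len(y)
    mean = sum(y) / n
    # Assumption: sample variance with denominator n - 1.
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    cells = defaultdict(list)
    for value, label in zip(y, labels):
        cells[label].append(value)
    # (n_i / n) * d^2 / (s^2 / n_i) simplifies to n_i^2 * d^2 / (n * s^2).
    return sum(len(c) ** 2 * (sum(c) / len(c) - mean) ** 2
               for c in cells.values()) / (n * var)


# Toy data: the first labelling separates low from high outcomes,
# the second labelling is unrelated to the outcome.
y = [1.0, 1.0, 3.0, 3.0]
informative = partition_score(y, ["a", "a", "b", "b"])
uninformative = partition_score(y, ["a", "b", "a", "b"])
print(informative, uninformative)
```

The informative labelling yields a score of about 1.5, whereas the uninformative one yields 0 because both cell means equal the overall mean.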

GxE association measure I 

Consider a marker G and an environmental factor E.
Suppose G has 3 genotypes, AA, Aa, and aa (A refers
to the major allele and a the minor allele), coded as 0,
1, and 2. Suppose E is divided into 3 categories: 0, 1,
and 2. G and E together thus create 9 partitions for
all subjects (Table 1). From the general framework in
the last section, an association measure that evaluates
the total effect of G and E on the phenotype is:

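
$$I_T = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{n_{ij}}{n} \cdot \frac{(\bar{y}_{ij} - \bar{y})^2}{s_y^2 / n_{ij}} \qquad (3)$$

where all the terms are defined as before, y denotes the phenotype,
and the subscript ij refers to the partition formed by the i-th
genotype and the j-th environmental category. The marginal effects of G and
E can be obtained in a similar fashion:

$$I_G = \sum_{i=1}^{3} \frac{n_{i\cdot}}{n} \cdot \frac{(\bar{y}_{i\cdot} - \bar{y})^2}{s_y^2 / n_{i\cdot}}, \qquad I_E = \sum_{j=1}^{3} \frac{n_{\cdot j}}{n} \cdot \frac{(\bar{y}_{\cdot j} - \bar{y})^2}{s_y^2 / n_{\cdot j}} \qquad (4)$$
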

The test statistic that measures the GxE interaction 
effect is defined as the difference between the total 
effect and the maximum of the two marginal effects: 


$$I_{G \times E} = I_T - \max\{I_G, I_E\} \qquad (5)$$

The significance of $I_{G \times E}$ is evaluated by permutation.
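To make Eq. (5) concrete, the sketch below computes the interaction score and a permutation p value on toy data. All names (`partition_score`, `gxe_score`, `permutation_p_value`) are hypothetical, the sample variance is an assumption, and the `(hits + 1) / (n_perm + 1)` estimator is a common convention rather than something stated in the text. Phenotypes are permuted over all subjects here, which ignores any pedigree structure.

```python
import random
from collections import defaultdict


def partition_score(y, labels):
    """Partition-based score of Eq. (2), using the sample variance (assumption)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    cells = defaultdict(list)
    for value, label in zip(y, labels):
        cells[label].append(value)
    return sum(len(c) ** 2 * (sum(c) / len(c) - mean) ** 2
               for c in cells.values()) / (n * var)


def gxe_score(y, g, e):
    """Total score on the joint (G, E) partition minus the larger marginal score."""
    total = partition_score(y, list(zip(g, e)))
    return total - max(partition_score(y, g), partition_score(y, e))


def permutation_p_value(y, g, e, n_perm=1000, seed=0):
    """Permutation p value for the interaction score (phenotypes shuffled globally)."""
    rng = random.Random(seed)
    observed = gxe_score(y, g, e)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if gxe_score(y_perm, g, e) >= observed:
            hits += 1
    # Assumption: add-one estimator so the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy crossover data: the outcome depends on G and E only jointly.
g = [0, 0, 1, 1] * 5
e = [0, 1, 0, 1] * 5
y = [float(gi != ei) for gi, ei in zip(g, e)]
print(gxe_score(y, g, e), permutation_p_value(y, g, e))
```

In this toy example both marginal scores are zero while the joint score is large, so the interaction score equals the total score.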

Permutation strategies 

We consider 3 permutation strategies in our analysis:
global permutation, local permutation, and residual permutation.
Let $y_{ij}$ denote the phenotype of the j-th individual
in the i-th pedigree. Global permutation permutes
phenotypes over all individuals. For local permutation,
the phenotypes are permuted within each pedigree.
In residual permutation, we first compute the
residual for each individual, $\varepsilon_{ij} = y_{ij} - \bar{y}_i$, where $\bar{y}_i$ is the
average phenotype for pedigree i, and then permute $\varepsilon_{ij}$ over
all subjects to obtain a permuted residual $\varepsilon^*_{ij}$ for each
individual. The permuted Y values $y^*_{ij}$ are obtained by
$y^*_{ij} = \bar{y}_i + \varepsilon^*_{ij}$. Both local permutation and residual
permutation assume $y_{ij} = \mu_i + \varepsilon_{ij}$, where $\mu_i$ is a
pedigree-specific mean, $E(\varepsilon_{ij}) = 0$, and the $\{\varepsilon_{ij}\}$ are
independent. Residual permutation further assumes that
the $\{\varepsilon_{ij}\}$ have the same distribution.
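The three permutation strategies described above can be sketched as follows. The function names and the shared `(y, pedigree, rng)` signature are illustrative assumptions; `pedigree` is taken to be a list of pedigree identifiers aligned with the phenotype list `y`.

```python
import random
from collections import defaultdict


def global_permutation(y, pedigree, rng):
    """Shuffle phenotypes over all individuals, ignoring pedigree membership."""
    permuted = list(y)
    rng.shuffle(permuted)
    return permuted


def local_permutation(y, pedigree, rng):
    """Shuffle phenotypes only among members of the same pedigree."""
    members = defaultdict(list)
    for index, ped in enumerate(pedigree):
        members[ped].append(index)
    permuted = list(y)
    for indices in members.values():
        values = [y[k] for k in indices]
        rng.shuffle(values)
        for k, value in zip(indices, values):
            permuted[k] = value
    return permuted


def residual_permutation(y, pedigree, rng):
    """Shuffle residuals (phenotype minus pedigree mean) over all individuals,
    then add each individual's own pedigree mean back."""
    groups = defaultdict(list)
    for value, ped in zip(y, pedigree):
        groups[ped].append(value)
    ped_mean = {ped: sum(vals) / len(vals) for ped, vals in groups.items()}
    residuals = [value - ped_mean[ped] for value, ped in zip(y, pedigree)]
    rng.shuffle(residuals)
    return [ped_mean[ped] + r for ped, r in zip(pedigree, residuals)]


# Toy example with two pedigrees whose mean phenotypes differ.
rng = random.Random(0)
y = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
pedigree = ["A", "A", "A", "B", "B", "B"]
print(local_permutation(y, pedigree, rng))
print(residual_permutation(y, pedigree, rng))
```

Local permutation keeps every phenotype inside its own pedigree, while residual permutation keeps each pedigree mean fixed in expectation but lets residuals move between pedigrees, which is why it additionally requires the residuals to share one distribution.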

Results 

Partitions created by environmental factors 

The real data set from GAW18 contains the records of
4 environmental factors: age, gender, smoking status,
and antihypertensive medication usage (medicine).
Because gender is a binary variable, it partitions all indi- 
viduals into 2 groups. Although this data set provides 
longitudinal measurements of age, smoking, and medi- 
cine, the records have many missing values (only 187 
subjects have complete measurements for all 4 visits). 
Therefore, for each individual, we summarized these 
covariates by either the averaged value (for age) or the 
sum (for smoking and medicine) across different time 
points from available records and used these summar- 
ized quantities in our analysis. Similarly, averaged SBP 
and averaged DBP were considered as outcomes. Here 
we created 3 partitions by each of age, smoking, and 
medicine (Table 2). 
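The summarization step described above can be sketched as follows. The helper names and, in particular, the cut points passed to `categorize` are hypothetical; the actual category boundaries are those given in Table 2 and are not reproduced here. Missing visits are represented by `None`.

```python
def summarize_visits(values, how):
    """Summarize one covariate across visits, skipping missing records (None).

    how="mean" averages the available values (used for age, SBP, DBP);
    how="sum" adds them up (used for smoking and medicine indicators).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    if how == "mean":
        return sum(observed) / len(observed)
    if how == "sum":
        return sum(observed)
    raise ValueError("how must be 'mean' or 'sum'")


def categorize(value, cut_points):
    """Map a summarized value to category 0, 1, 2, ... by counting exceeded cut points."""
    return sum(value > c for c in cut_points)


# One hypothetical subject with records at visits 1 and 3 only.
mean_age = summarize_visits([50, None, 54, None], "mean")
medicine_total = summarize_visits([1, None, 1, None], "sum")
# Hypothetical cut points (0 and 1) giving 3 medicine categories.
print(mean_age, medicine_total, categorize(medicine_total, (0, 1)))
```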

SNPs with significant GxE interaction effects 

In the GWAS data set provided by GAW18, there are 
62,915 SNPs on chromosome 3. For each SNP, we 
evaluated its interaction effect with each of the 4 
environmental factors on both SBP and DBP using the 
linear regression model (LRM) and the proposed partition-based
score I (PBI). p Values of LRM were derived
from the asymptotic distribution of the regression
coefficient $\beta_3$, and p values of PBI were computed from
10^ permutations using global, local, or residual per- 
mutation procedures. Table 3 lists the number of SNPs 
with p values less than the Bonferroni-corrected significance
level ($7.9 \times 10^{-7}$) for all interactions under
consideration. Compared with LRM, PBI identified many
additional significant SNPs, especially when testing the
GxE interaction effects with medicine. The reason, we
believe, is that the interaction modeled by LRM is
restricted to the linear form, whereas PBI is able to
capture nonlinear and complicated interaction patterns.
To test this hypothesis, we further analyzed
the SNP rs17206492, which was identified by PBI
(using any of the 3 permutation strategies) to have a
strong GxMedicine interaction effect on DBP, but was
not selected by LRM. The left panel of Figure 1 shows
that the averaged values of DBP in individuals not carrying
the minor allele (genotype 0) and in individuals
carrying the minor allele (genotype 1) are almost the
same, indicating that rs17206492 does not have a strong
marginal effect. However, as medication usage increases,
DBP first decreases and then increases when the genotype
is 1 (middle panel of Figure 1), but first increases and
then decreases when the genotype is 0 (right panel of
Figure 1). This nonlinear interaction scheme cannot be
detected by LRM, but is captured by our model-free test
statistic PBI.
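The Bonferroni-corrected level quoted in this section can be checked with a short calculation. The family-wise level of 0.05 is an assumption (the text does not state it), but it reproduces the reported value once 62,915 SNPs are accounted for. The sketch also shows the general point that a permutation p value cannot fall below roughly one over the number of permutations, so reaching this threshold requires more than a million permutations per test.

```python
n_snps = 62915   # SNPs on chromosome 3, as reported in the text
alpha = 0.05     # assumption: family-wise significance level

# Bonferroni correction: divide the family-wise level by the number of tests.
threshold = alpha / n_snps
print(f"Bonferroni threshold: {threshold:.2e}")

# With N permutations, the smallest attainable p value is on the order of 1/N,
# so N must exceed about 1 / threshold for any SNP to pass the threshold.
approx_min_permutations = n_snps / alpha
print(f"Permutations needed: more than about {approx_min_permutations:,.0f}")
```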

Effect of different permutation strategies 

There are 20 pedigrees in the GAW18 data set. Both
the analysis of variance (ANOVA) test and the nonparametric
Kruskal-Wallis test indicate that the mean DBP
values differ significantly among pedigrees, whereas no
significant difference is found for the mean SBP values
(Table 4). When evaluating the p values of PBI, we
performed 3 types of permutation: global (GP), local (LP),
and residual (RP) permutations. Both LP and RP adjust for
familial relatedness between individuals. For SBP, except for the
environmental factor age, the results from the 3 permutation
methods coincide substantially (see Table 3 and
Figure 2), which is consistent with the conclusion from
the ANOVA and Kruskal-Wallis tests. In contrast, for DBP,
the results of GP are quite different from the results of
LP or RP, especially when assessing the interaction
effect with medicine (see Table 3 and Figure 2). In this
situation, the results from LP or RP are more reliable
because they take into account the family dependence
of the phenotype. In addition, LP tends to select more
markers than RP; this may be because the data violate
the assumption that the $\{\varepsilon_{ij}\}$ have the same distribution.
Moreover, SNPs identified by LP and RP overlap considerably,
and the consistency of results from these two
permutation strategies can be an indicator of true
signal.

Table 2 Partitions based on the summarized quantities of age, smoking status, or medicine

Table 3 Number of significant SNPs with p value less than $7.9 \times 10^{-7}$

Figure 1 GxE interaction effect of SNP rs17206492 and medicine. The marginal effect of the genotype (left), the medication effect when
genotype is 1 (middle), and the medication effect when genotype is 0 (right).

Discussion 

In this paper, we have proposed a partition-based
approach, PBI, to detect GxE interactions, which is
nonparametric and model-free. The test statistic is derived
from a partition-based measure I, and the interaction
information score $I_{G \times E}$ is defined as the difference
between the total score $I_T$ and the maximum of the
marginal scores. Intuitively, if the genetic and the
environmental factors have a strong interaction effect, $I_T$ will
be far greater than both marginal scores; hence $I_{G \times E}$ will
be positive and large. If not, $I_T$ will be no greater than
at least 1 of the marginal scores. Therefore, $I_{G \times E}$
evaluates the amount of influence of the GxE interactions on
the phenotype.



When applied to the real data set about hypertension 
provided by GAW18, PBI identified many more markers 
than the traditional linear regression method. Because 
our approach is model-free, it is able to capture complicated
interaction patterns that are difficult to detect in a
linear model. The significance of $I_{G \times E}$ is evaluated by
permutation. LP and RP adjust effectively for the family 
dependence of the phenotype. Despite the fact that the 
proposed procedure selects more SNPs than linear 
regression, there is very little experimental evidence of 
GxE interactions for hypertension in the current litera- 
ture to verify our findings. Therefore, biological studies 
will be required to investigate our results. Modifications 
of PBI have successfully identified gene-gene interac- 
tions and constructed genetic networks for breast cancer 
[6] and rheumatoid arthritis [7]. Moreover, PBI can be 
extended to evaluate the interaction effects between rare 
variants and environmental factors. Because of the low 
frequencies of rare variants (<1%), we can apply a gene- 
based approach by collapsing rare variants in a gene 
[8-11] and creating partitions based on the collapsed 
information. 



Table 4 p Values for testing the pedigree dependence of SBP and DBP

        ANOVA test    Kruskal-Wallis test
SBP     0.155         0.433
DBP     0.000625      0.0004226

Figure 2 Positions of SNPs identified to have significant GxE interaction effects by PBI using different permutation strategies for both
SBP and DBP on chromosome 3 (panel title: Positions of Significant SNPs on Chromosome 3 for SBP)